HRaid: A Flexible Storage-system Simulator

Authors

  • Toni Cortes
  • Jesús Labarta

Abstract

Function           Abstraction that has it   Abstraction that uses it
request data       Disk                      Controller
deliver request    Controller                Network
deliver request    Distributor               Network
send request       Network                   Distributor
send reply         Network                   Controller, Distributor
grab network       Network                   Controller, Distributor
release network    Network                   Controller, Distributor
send data          Network                   Controller, Distributor

Table 1: Interface between modules.

A module can only insert and extract events for itself. All modules must have an entry function that is called by the event manager whenever an event for that module is triggered. As we can see, modules are quite independent from one another. This simplifies the programming of new modules, as no special care has to be taken to make new modules work with older ones. This freedom of connection between modules is what makes HRaid flexible enough to model nearly any kind of storage system.

2.3 Some Examples

To clarify how these abstractions can be used, we present a set of storage systems and show how each is modeled using HRaid.

Heterogeneous Network RAID

In this example, we have a cluster of workstations whose nodes are connected through a bus-type network. We also have four disks, one attached to each node. Among these disks, two are fast and two are slow. In this environment (shown in Figure 2), we want to build a network RAID level 5. HRaid allows us to model this storage system and to study the effect that load imbalance may have on performance. It can also help us to see the effect of sharing the bus, etc.

Figure 2: Heterogeneous Network RAID.
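The event-driven coupling described above can be sketched as follows. This is a minimal illustration, not HRaid's actual API: the class and method names (`EventManager`, `entry`, the `Logger` module) are invented for the example. The event manager pops timestamped events and dispatches each one to the entry function of the module it targets.

```python
import heapq

class EventManager:
    """Dispatches timestamped events to each target module's entry function."""
    def __init__(self):
        self._queue = []   # heap of (time, seq, module, payload)
        self._seq = 0      # tie-breaker so equal timestamps never compare modules
        self.now = 0.0

    def insert(self, time, module, payload):
        heapq.heappush(self._queue, (time, self._seq, module, payload))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, module, payload = heapq.heappop(self._queue)
            module.entry(self, payload)   # every module exposes one entry point

class Logger:
    """Toy module: records when its events fire."""
    def __init__(self):
        self.fired = []
    def entry(self, mgr, payload):
        self.fired.append((mgr.now, payload))

mgr = EventManager()
log = Logger()
mgr.insert(2.0, log, "send request")
mgr.insert(1.0, log, "request data")
mgr.run()
print(log.fired)   # events are delivered in timestamp order
```

Because each module only sees its own events through its single entry function, new modules can be added without touching existing ones, which is the independence property the text describes.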
To model this storage system, we need two distributor modules: one RAID level 5 and one forwarder, which only forwards the requests it receives to its attached disk. We also need two network modules: one SCSI bus and one Ethernet bus. Finally, we need the slow and fast controller and disk modules.

Network-attached RAID

In this second example, we have a network-attached RAID that is part of two independent storage systems. Both hosts share the RAID, which is connected to a shared bus. Figure 3 presents a schema of this environment. To model it, we define two storage systems (SS1 and SS2) that are connected to a RAID through a bus. Both the bus and the RAID are shared, which is specified by giving them a name (SBus and SNAR) that is used in both storage-system definitions.

Figure 3: Network-attached RAID.

RAIDs level 10 and 53

Besides the typical RAID levels (0-5), combinations of the basic levels have also been proposed. For instance, a RAID level 10 is a RAID level 0 where each segment is a RAID level 1. This kind of combination is extremely easy to model in HRaid: we only have to configure a RAID level 0 where each device is a RAID level 1. In a similar way, we may define a RAID level 53, which is a combination of levels 3 and 0. Although this is not necessarily a network example, it shows the flexibility of our tool. All pieces can be mixed together to build more complicated systems with very little (or no) effort.

2.4 Implemented Modules

So far, we have implemented the most common modules, so that the simulator can already be used to perform research.
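The RAID level 10 composition just described (a level-0 distributor whose devices are level-1 mirrors) can be sketched as follows. The classes and their `write` interface are illustrative inventions, not HRaid code; each `write` returns the list of (disk, block) operations it produces so the composition is visible.

```python
class Disk:
    """Leaf device: a write lands on this disk at the given block."""
    def __init__(self, name):
        self.name = name
    def write(self, block, data):
        return [(self.name, block)]

class Raid1:
    """Mirror: forwards each write to every underlying device."""
    def __init__(self, devices):
        self.devices = devices
    def write(self, block, data):
        ops = []
        for d in self.devices:
            ops += d.write(block, data)
        return ops

class Raid0:
    """Striping: block b goes to device b % n at offset b // n."""
    def __init__(self, devices):
        self.devices = devices
    def write(self, block, data):
        n = len(self.devices)
        return self.devices[block % n].write(block // n, data)

# RAID 10 = RAID 0 striped over two RAID 1 mirror pairs
raid10 = Raid0([
    Raid1([Disk("d0"), Disk("d1")]),
    Raid1([Disk("d2"), Disk("d3")]),
])
print(raid10.write(3, b"x"))   # block 3 -> second mirror pair, offset 1
```

Because every level exposes the same interface, a device of one distributor can itself be another distributor, which is exactly what makes levels 10 and 53 free to model.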
In this section, we describe them and present the parameters that can be used to modify their behavior.

Disk Controller and Drive

The disk drive and controller implemented follow the definition proposed by Ruemmler and Wilkes [6]. This is a quite detailed model of disk and controller, currently used by most disk simulators. Tables 2 and 3 present the parameters needed to model a controller and a disk as proposed in the aforementioned paper.

Physical geometry:     sector size, sectors per track, tracks per cylinder, cylinders
Transfer rate:         revolutions per minute
Seek/search overhead:  track-switch overhead, track skew, cylinder skew, head-movement overhead

Table 2: Disk parameters.

CPU consumption:       new-command overhead
Cache information:     block size, cache size, read ahead (Yes/No)
Transfer information:  read fence, write fence, immediate report (Yes/No), done message size

Table 3: Controller parameters.

Bus

The only network implemented so far is a bus-based network. We believe this is the most general kind of network and will be useful in most cases. We model the bus in a simple but widely used way, based on latency and bandwidth: the latency models the time needed to start a communication, while the bandwidth is used to compute the time needed to send a message through the bus. The bus only allows one message at a time, which simulates the network contention found in a real network. This is important, as it can make a big difference when all the disks in a RAID are connected through a bus, which is the most common case [7].

Sequential File System (or forwarder)

This module is basically offered for users who only want to work with a single disk or a hardware RAID.
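The bus model described above (transfer time = latency + size / bandwidth, with only one message on the bus at a time) can be sketched as follows. The class name and the parameter values are invented for illustration; a message that arrives while the bus is busy waits until the bus becomes free, which is how contention shows up in the simulated times.

```python
class Bus:
    """Latency/bandwidth bus model with a single outstanding message."""
    def __init__(self, latency, bandwidth):
        self.latency = latency        # seconds to start a communication
        self.bandwidth = bandwidth    # bytes per second
        self.free_at = 0.0            # simulated time the bus becomes idle

    def send(self, now, size):
        """Return (start, finish) times for a message of `size` bytes."""
        start = max(now, self.free_at)                 # wait if the bus is busy
        finish = start + self.latency + size / self.bandwidth
        self.free_at = finish                          # bus held until finish
        return start, finish

bus = Bus(latency=0.001, bandwidth=10e6)   # hypothetical: 1 ms, 10 MB/s
first = bus.send(0.0, 100_000)
second = bus.send(0.005, 100_000)          # arrives while the bus is busy
print(first, second)                       # second transfer is delayed
```

This queueing delay is what makes fast disks slow down their slower peers in the RAID1 write experiment later in the paper: a device that arrives while the bus is held must wait out the whole ongoing transfer.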
The only thing it does is receive requests from the trace file and forward them to the attached device through the network. We could not use a single disk without this module, because then we would have no way to model the bus that connects the host to the controller.

RAID

Finally, we have implemented four modules that emulate the behavior of RAID levels 0, 1, 4 and 5 as described by Chen et al. [7]. It is important to notice that all these RAIDs can distribute their blocks either over drives or over other storage systems. It is also very important to realize that RAIDs do not need to be homogeneous: we can have a RAID with two fast disks and four slow ones.

CPU consumption:       new-command overhead
Transfer information:  block size, immediate report (Yes/No), request message size
RAID level 1:          read policy
RAID levels 4 & 5:     parity-computation overhead, small-write policy

Table 4: RAIDx parameters.

The parameters used to define the behavior of a RAID are presented in Table 4. Among them, we would like to explain three that may not be clear from their names alone. The first one is the read policy needed for RAID level 1. This parameter tells the module how to choose between the two copies of a block in a read operation; the two implemented options are random and shortest-queue-first [8]. For RAID levels 4 and 5, we have also implemented two small-write policies: read-modify-write and regenerate-write [8]. Finally, we also allow immediate reporting, which means that the distributor will send a done message as soon as all the data is in its buffers, even if no data has yet been physically written.

Figure 4: Read and write performance for a RAID1.

2.5 Model Validation

All modules already implemented have been validated by comparing their results to ones obtained experimentally from real systems.
The disk is the only exception, as we are still trying to obtain real traces from a real disk. In the meantime, we compared it to the simulator implemented by Kotz [2]. When comparing both simulators, all requests differed by less than 0.2%. As Kotz's simulator was already validated, this also validates our implementation.

3 Using HRaid in Simple Examples

To illustrate some of the issues that can be studied with this new tool, we present two simple, but interesting, examples. Figures 4 and 5 present the performance of RAID1 and RAID5 configurations while varying the kind of disks used to build them. In both examples, we used two kinds of disks, slow and fast; the fast disk was nearly twice as fast as the slow one. Each bar in the graphs represents a RAID configuration with a given number of fast disks (F) and slow ones (S).

In Figure 4, we can see that in a RAID level 1, replacing slow disks with fast ones only makes sense if you replace half of them. In that case, read operations improve considerably, as the fastest disks are used for reading. On the other hand, write operations are not improved by replacing slow disks with fast ones, because all disks are needed to perform a write operation. Actually, performance decreases. This happens because fast disks tend to use the network more greedily, so the slow ones, which are the bottleneck, have to wait for the network. This increases their response time, decreasing overall write performance.

Figure 5: Read and write performance for a RAID5 (regenerate-write for small writes).

The results of performing the same experiment on a RAID level 5 are shown in Figure 5. On read operations, we can see the effect of network contention, as explained for RAID level 1.
On the other hand, write operations improve when fast disks are added, because all the small writes that can be performed on fast disks alone are done much faster.

4 Conclusions

In this paper, we have presented a new tool aimed at simplifying the task of researching storage systems for clusters of workstations. We have presented its main abstractions, its functionality, and some simple examples of how it can be used.

Acknowledgments

We would like to thank David Kotz for allowing us to use his disk simulator. We learned a lot by examining and using it.

References

[1] W. V. Courtright, G. Gibson, M. Holland, and J. Zelenka. A structured approach to redundant disk array implementation. In Proceedings of the International Computer Performance and Dependability Symposium, September 1996.

[2] D. Kotz, S. B. Toh, and S. Radhakrishnan. A detailed simulation model of the HP 97560 disk drive. Technical Report PCS-TR94-220, Department of Computer Science, Dartmouth College, July 1994.

[3] G. R. Ganger, B. L. Worthington, and Y. N. Patt. The DiskSim Simulation Environment, Version 1.0 Reference Manual. Technical Report CSE-TR-385-98, Department of Electrical Engineering and Computer Science, University of Michigan, February 1998.

[4] P. F. Corbett and D. G. Feitelson. The Vesta parallel file system. ACM Transactions on Computer Systems, 14(3):225-264, August 1996.

[5] S. A. Moyer and V. S. Sunderam. PIOUS: A scalable parallel I/O system for distributed computing environments. In Proceedings of the Scalable High Performance Computing Conference, pages 71-79, 1994.

[6] C. Ruemmler and J. Wilkes. An introduction to disk drive modeling. IEEE Computer, pages 17-28, March 1994.

[7] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. RAID: High-performance and reliable secondary storage. ACM Computing Surveys, 26(2):145-185, 1994.

[8] S. Chen and D. Towsley. A performance evaluation of RAID architectures. IEEE Transactions on Computers, 45(10):1116-1130, October 1996.


Publication date: 1999